| cohort | count |
|---|---|
| control | 1051 |
| ibis | 117 |
Zero Inflated Models for Inpatient Admissions
We consider Unicare Study members vs all other members, using cohort as well as select other covariates. Even when ignorability, balance, and other assumptions do not fully hold, covariate adjustments often result in reduced bias and improved efficiency (lower variance) of estimates.
Regression Models
Negative binomial regression; intercept only and with cohort as covariate
Zero inflated Poisson and negative binomial; cohort and select other covariates on count and zero inflation portions of the model
Bayesian versions of the above
Modeling Hospital Admissions counts
We consider admissions occurring within a 270-day window of observation.
Admissions counts
Overall
| mean | variance |
|---|---|
| 0.512 | 1.39 |
By cohort
| cohort | mean | variance |
|---|---|---|
| control | 0.549 | 1.511 |
| ibis | 0.179 | 0.218 |
With the overall variance not equal to the mean, a Poisson model is not appropriate. We consider a negative binomial model, which allows for overdispersion. Note that the dispersion parameter is a global parameter, not estimated separately for each cohort. That said, since the model variance is a function of the mean, the variance implied by the model is still adjusted by cohort.
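As a quick numeric sketch of this point, we can plug the cohort means from the table above, together with the fitted global dispersion (\(\theta = 0.336\) from the cohort model below), into the negative binomial variance formula \(\mu + \mu^2/\theta\):

```python
# Sketch: with a single global dispersion theta, the negative binomial
# variance mu + mu^2 / theta still differs by cohort through the means.
# Values are the cohort means from the table and the fitted theta = 0.336.
theta = 0.336
means = {"control": 0.549, "ibis": 0.179}

variances = {k: mu + mu**2 / theta for k, mu in means.items()}
print(variances)
```

The implied variances (about 1.45 for control and 0.27 for ibis) are in rough agreement with the empirical variances 1.511 and 0.218.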
Covariates, multicollinearity
Perhaps surprisingly, the age and chronic condition covariates do not exhibit high multicollinearity. This means not only that there are no high pairwise correlations, but also that no one of them is explainable by the others: regressing any one on the rest would result in a low R squared value.
By contrast, if a total condition count were included, the covariates would be completely collinear, since the count is the sum of the individual condition indicators.
Low multicollinearity is a desirable feature for generalized linear models, as it results in more stable coefficient estimates.
Condition number
We find the condition number of the model matrix with age and the other covariates. This is the square root of the ratio of the max and min eigenvalues of the correlation matrix, and is thus the ratio of the standard deviations along the components of the data with maximum and minimum variation:
\(\sqrt{\frac{\lambda_{max}}{\lambda_{min}}}\) = 2.19
As a rule of thumb, a condition number > 30 is an indication of high multicollinearity.
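A minimal sketch of the computation on hypothetical data (the matrix and mixing weight below are illustrative, not our covariates):

```python
import numpy as np

# Condition number as sqrt(lambda_max / lambda_min) of the correlation matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))     # hypothetical covariates
X[:, 4] += 0.3 * X[:, 0]          # introduce mild correlation

corr = np.corrcoef(X, rowvar=False)
eig = np.linalg.eigvalsh(corr)
cond = float(np.sqrt(eig.max() / eig.min()))
print(round(cond, 2))             # well under the rule-of-thumb threshold of 30
```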
Correlations
Selecting covariates under low multicollinearity is also more straightforward: simply choosing those most highly correlated with the outcome is more justifiable than in the high multicollinearity case.
| condition | correlation |
|---|---|
| atrial_fibrillation | 0.2337 |
| age | 0.1525 |
| chf | 0.1354 |
| copd | 0.0793 |
| lung_cancer | 0.0770 |
| kidney_disease | 0.0651 |
| urologic_cancer | 0.0635 |
| anemia | 0.0570 |
| chd | 0.0539 |
| dementia | 0.0506 |
| endometrial_cancer | 0.0502 |
| parkinsons | 0.0461 |
| alzheimers | 0.0415 |
| hip_fracture | 0.0316 |
| pneumonia | 0.0223 |
| heart_attack | 0.0192 |
| hypothyroidism | 0.0179 |
| mild_cognitive_impairment | 0.0091 |
| hypertension | 0.0070 |
| stroke | 0.0033 |
| hyperplasia | 0.0027 |
| diabetes | -0.0031 |
| glaucoma | -0.0108 |
| asthma | -0.0150 |
| breast_cancer | -0.0157 |
| cataract | -0.0218 |
| anxiety | -0.0242 |
| osteoporosis | -0.0249 |
| depression | -0.0263 |
| arthritis | -0.0385 |
| prostate_cancer | -0.0440 |
| hyperlipidemia | -0.0493 |
| colorectal_cancer | -0.0582 |
We select age, atrial_fibrillation, and chf, used variously in the models below.
Admissions counts; negative binomial
We first compare the intercept-only and cohort models.
\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 \end{aligned} \]
| term | estimate | p.value | exp_estimate |
|---|---|---|---|
| (Intercept) | -0.669 | 0 | 0.512 |
\(\theta\): 0.319
\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i \end{aligned} \]
| term | estimate | p.value | exp_estimate |
|---|---|---|---|
| (Intercept) | -0.60 | 0 | 0.549 |
| cohortibis | -1.12 | 0 | 0.327 |
\(\theta\): 0.336
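Since the model uses a log link, exponentiating the (rounded) estimates recovers the observed cohort means; a quick check:

```python
import math

# exp of the intercept gives the control mean; adding the cohortibis
# coefficient gives the ibis mean (rounded estimates from the table above).
b0, b1 = -0.60, -1.12
mean_control = math.exp(b0)
mean_ibis = math.exp(b0 + b1)
print(round(mean_control, 3), round(mean_ibis, 3))
```

These match the observed cohort means 0.549 and 0.179.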
Model comparison
Negative binomial with vs without cohort as predictor
The likelihood ratio test, ANOVA, and the p-value on the coefficient are all equivalent here.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -0.60 | 0.068 | -8.87 | 0 |
| cohortibis | -1.12 | 0.279 | -4.01 | 0 |
Predictive check, NB model for admissions counts
Model
\[\textrm{counts}_i \sim NB(\mu_i, \theta)\] \[\log \mu_i = \beta_0 + \beta_1 \textrm{cohort}\]
Observed vs predicted admissions count proportions; negative binomial
For the control cohort
| inpatient_count | frequency | prevalence | expected | prob |
|---|---|---|---|---|
| 0 | 761 | 0.725 | 757.699 | 0.722 |
| 1 | 151 | 0.144 | 157.845 | 0.150 |
| 2 | 71 | 0.068 | 65.416 | 0.062 |
| 3 | 35 | 0.033 | 31.605 | 0.030 |
| 4 | 19 | 0.018 | 16.355 | 0.016 |
| 5 | 4 | 0.004 | 8.801 | 0.008 |
| 6 | 4 | 0.004 | 4.857 | 0.005 |
| 7 | 1 | 0.001 | 2.728 | 0.003 |
| 8 | 0 | 0.000 | 1.552 | 0.001 |
| 9 | 3 | 0.003 | 0.892 | 0.001 |
| 10 | 0 | 0.000 | 0.517 | 0.000 |
For the ibis cohort
| inpatient_count | frequency | prevalence | expected | prob |
|---|---|---|---|---|
| 0 | 100 | 0.855 | 101.331 | 0.866 |
| 1 | 13 | 0.111 | 11.851 | 0.101 |
| 2 | 4 | 0.034 | 2.757 | 0.024 |
| 3 | 0 | 0.000 | 0.748 | 0.006 |
| 4 | 0 | 0.000 | 0.217 | 0.002 |
| 5 | 0 | 0.000 | 0.066 | 0.001 |
| 6 | 0 | 0.000 | 0.020 | 0.000 |
| 7 | 0 | 0.000 | 0.006 | 0.000 |
| 8 | 0 | 0.000 | 0.002 | 0.000 |
| 9 | 0 | 0.000 | 0.001 | 0.000 |
| 10 | 0 | 0.000 | 0.000 | 0.000 |
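The prob column can be reproduced (approximately, given rounding of the estimates) from the negative binomial pmf. A sketch for the ibis cohort, with \(\mu = \exp(-0.60 - 1.12)\) and \(\theta = 0.336\):

```python
import math

# Negative binomial pmf, parameterized by mean mu and dispersion theta.
def nb_pmf(y, mu, theta):
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + y * math.log(mu / (mu + theta))
             + theta * math.log(theta / (mu + theta)))
    return math.exp(log_p)

mu_ibis = math.exp(-0.60 - 1.12)   # ~0.179
theta = 0.336
print([round(nb_pmf(y, mu_ibis, theta), 3) for y in range(4)])
```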
Zero inflated negative binomial
With cohort as covariate on both zero inflation and count portions of the model
This example is for illustration only.
\[ \begin{aligned} \textrm{counts}_i &\sim \text{ZINB}(\pi_i, \mu_i, \theta) \\ \log\left( \frac{\pi_i}{1 - \pi_i} \right) &= \gamma_0 + \gamma_1 \, \textrm{cohort}_i \\ \log(\mu_i) &= \beta_0 + \beta_1 \, \textrm{cohort}_i \end{aligned} \]
| term | estimate |
|---|---|
| count_(Intercept) | -0.335 |
| count_cohortibis | -1.382 |
| zero_(Intercept) | -1.193 |
| zero_cohortibis | -6.505 |
\(\theta\): 0.505
The coefficients are not significant, but see the models below.
Zero inflation probabilities
control: \(\textrm{logit}^{-1}(-1.193) = 0.233\)
ibis: \(\textrm{logit}^{-1}(-1.193 - 6.505) = 4.532 \times 10^{-4}\)
The count mean for ibis is also reduced, by a factor of \(\exp(-1.382) = 0.251\).
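These quantities follow directly from the coefficient table; a sketch of the arithmetic:

```python
import math

# Inverse logit of the zero-component coefficients gives the zero
# inflation probabilities; exp of the count coefficient gives the
# multiplicative effect on the count mean.
def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

pi_control = inv_logit(-1.193)        # ~0.233
pi_ibis = inv_logit(-1.193 - 6.505)   # essentially zero
ratio_ibis = math.exp(-1.382)         # ~0.251
print(round(pi_control, 3), pi_ibis, round(ratio_ibis, 3))
```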
Observed vs expected admissions counts; zero inflated negative binomial
For the control cohort
| inpatient_count | frequency | prevalence | expected | prob |
|---|---|---|---|---|
| 0 | 761 | 0.725 | 759.552 | 0.724 |
| 1 | 151 | 0.144 | 152.621 | 0.145 |
| 2 | 71 | 0.068 | 67.325 | 0.064 |
| 3 | 35 | 0.033 | 32.954 | 0.031 |
| 4 | 19 | 0.018 | 16.927 | 0.016 |
| 5 | 4 | 0.004 | 8.940 | 0.009 |
| 6 | 4 | 0.004 | 4.808 | 0.005 |
| 7 | 1 | 0.001 | 2.619 | 0.002 |
| 8 | 0 | 0.000 | 1.440 | 0.001 |
| 9 | 3 | 0.003 | 0.798 | 0.001 |
| 10 | 0 | 0.000 | 0.445 | 0.000 |
For the ibis cohort
| inpatient_count | frequency | prevalence | expected | prob |
|---|---|---|---|---|
| 0 | 100 | 0.855 | 100.345 | 0.858 |
| 1 | 13 | 0.111 | 13.286 | 0.114 |
| 2 | 4 | 0.034 | 2.622 | 0.022 |
| 3 | 0 | 0.000 | 0.574 | 0.005 |
| 4 | 0 | 0.000 | 0.132 | 0.001 |
| 5 | 0 | 0.000 | 0.031 | 0.000 |
| 6 | 0 | 0.000 | 0.008 | 0.000 |
| 7 | 0 | 0.000 | 0.002 | 0.000 |
| 8 | 0 | 0.000 | 0.000 | 0.000 |
| 9 | 0 | 0.000 | 0.000 | 0.000 |
| 10 | 0 | 0.000 | 0.000 | 0.000 |
More models; summary and comparisons
Below we fit additional models and summarize eight models in total. All of the models use cohort, chf, and atrial_fibrillation as covariates, and vary along three dimensions:
zero inflated vs not
Poisson vs negative binomial distribution
with and without age as a covariate
Zero inflated age
Two of the models include age as a covariate for the zero inflation part of the model, as well as cohort, chf, and atrial_fibrillation on the count portion. We highlight those here. The full summary of all the models follows.
Zero inflated Poisson with age zero inflation
| term | component | estimate | p.value |
|---|---|---|---|
| (Intercept) | count | 1.3113 | 0.0001 |
| cohortibis | count | -1.1633 | 0.0000 |
| age | count | -0.0145 | 0.0019 |
| chf | count | 0.1838 | 0.1292 |
| atrial_fibrillation | count | 0.3695 | 0.0006 |
| (Intercept) | zero | 3.8854 | 0.0000 |
| age | zero | -0.0465 | 0.0000 |
Zero inflated negative binomial with age zero inflation
| term | component | estimate | p.value |
|---|---|---|---|
| (Intercept) | count | 0.5769 | 0.2944 |
| cohortibis | count | -1.1460 | 0.0000 |
| age | count | -0.0136 | 0.0463 |
| chf | count | 0.3708 | 0.0424 |
| atrial_fibrillation | count | 0.6520 | 0.0000 |
| Log(theta) | count | -0.2263 | 0.4097 |
| (Intercept) | zero | 4.0004 | 0.0000 |
| age | zero | -0.0655 | 0.0000 |
Coefficients for all models
Poisson and negative binomial; no zero inflation
| model | (Intercept) | cohortibis | chf | atrial_fibrillation | age |
|---|---|---|---|---|---|
| Pois1 | -0.878 | -1.12 | 0.395 | 0.760 | NA |
| Pois2 | -1.533 | -1.14 | 0.390 | 0.687 | 0.0093 |
| NB1 | -0.883 | -1.06 | 0.421 | 0.753 | NA |
| NB2 | -1.329 | -1.08 | 0.412 | 0.695 | 0.0064 |
Poisson and negative binomial; no zero inflation - long format with p-values
| term | estimate | pval | model |
|---|---|---|---|
| (Intercept) | -0.8781 | 0.0000 | Pois1 |
| cohortibis | -1.1217 | 0.0000 | Pois1 |
| chf | 0.3946 | 0.0003 | Pois1 |
| atrial_fibrillation | 0.7603 | 0.0000 | Pois1 |
| (Intercept) | -1.5330 | 0.0000 | Pois2 |
| cohortibis | -1.1447 | 0.0000 | Pois2 |
| age | 0.0093 | 0.0159 | Pois2 |
| chf | 0.3895 | 0.0004 | Pois2 |
| atrial_fibrillation | 0.6868 | 0.0000 | Pois2 |
| (Intercept) | -0.8831 | 0.0000 | NB1 |
| cohortibis | -1.0646 | 0.0001 | NB1 |
| chf | 0.4207 | 0.0324 | NB1 |
| atrial_fibrillation | 0.7527 | 0.0000 | NB1 |
| (Intercept) | -1.3289 | 0.0016 | NB2 |
| cohortibis | -1.0830 | 0.0001 | NB2 |
| age | 0.0064 | 0.2794 | NB2 |
| chf | 0.4122 | 0.0359 | NB2 |
| atrial_fibrillation | 0.6946 | 0.0000 | NB2 |
Zero inflated Poisson and negative binomial.
| term | component | estimate | p.value | model |
|---|---|---|---|---|
| (Intercept) | count | 0.2615 | 0.0013 | ZIP1 |
| cohortibis | count | -1.0836 | 0.0000 | ZIP1 |
| chf | count | 0.1673 | 0.1680 | ZIP1 |
| atrial_fibrillation | count | 0.3072 | 0.0030 | ZIP1 |
| (Intercept) | zero | 0.5336 | 0.0000 | ZIP1 |
| (Intercept) | count | -0.8830 | 0.0000 | ZINB1 |
| cohortibis | count | -1.0647 | 0.0001 | ZINB1 |
| chf | count | 0.4207 | 0.0314 | ZINB1 |
| atrial_fibrillation | count | 0.7527 | 0.0000 | ZINB1 |
| Log(theta) | count | -0.9474 | 0.0000 | ZINB1 |
| (Intercept) | zero | -9.8505 | 0.8728 | ZINB1 |
| (Intercept) | count | 0.4418 | 0.2044 | ZIP2 |
| cohortibis | count | -1.0793 | 0.0000 | ZIP2 |
| age | count | -0.0025 | 0.5945 | ZIP2 |
| chf | count | 0.1697 | 0.1623 | ZIP2 |
| atrial_fibrillation | count | 0.3238 | 0.0027 | ZIP2 |
| (Intercept) | zero | 0.5405 | 0.0000 | ZIP2 |
| (Intercept) | count | -1.3288 | 0.0011 | ZINB2 |
| cohortibis | count | -1.0830 | 0.0001 | ZINB2 |
| age | count | 0.0064 | 0.2659 | ZINB2 |
| chf | count | 0.4122 | 0.0347 | ZINB2 |
| atrial_fibrillation | count | 0.6947 | 0.0000 | ZINB2 |
| Log(theta) | count | -0.9405 | 0.0000 | ZINB2 |
| (Intercept) | zero | -10.0586 | 0.8747 | ZINB2 |
| (Intercept) | count | 1.3113 | 0.0001 | ZIP3 |
| cohortibis | count | -1.1633 | 0.0000 | ZIP3 |
| age | count | -0.0145 | 0.0019 | ZIP3 |
| chf | count | 0.1838 | 0.1292 | ZIP3 |
| atrial_fibrillation | count | 0.3695 | 0.0006 | ZIP3 |
| (Intercept) | zero | 3.8854 | 0.0000 | ZIP3 |
| age | zero | -0.0465 | 0.0000 | ZIP3 |
| (Intercept) | count | 0.5769 | 0.2944 | ZINB3 |
| cohortibis | count | -1.1460 | 0.0000 | ZINB3 |
| age | count | -0.0136 | 0.0463 | ZINB3 |
| chf | count | 0.3708 | 0.0424 | ZINB3 |
| atrial_fibrillation | count | 0.6520 | 0.0000 | ZINB3 |
| Log(theta) | count | -0.2263 | 0.4097 | ZINB3 |
| (Intercept) | zero | 4.0004 | 0.0000 | ZINB3 |
| age | zero | -0.0655 | 0.0000 | ZINB3 |
Fit
We can use, e.g., a chi-square test on the log likelihood difference to check whether the difference between nested models is statistically significant. See the above tables to infer the covariates and type of model for each model name.
| model | logLik | df | AIC | BIC |
|---|---|---|---|---|
| Pois1 | -1240 | 4 | 2487 | 2507 |
| Pois2 | -1237 | 5 | 2483 | 2509 |
| NB1 | -1059 | 5 | 2127 | 2153 |
| NB2 | -1058 | 6 | 2128 | 2158 |
| ZIP1 | -1106 | 5 | 2222 | 2248 |
| ZINB1 | -1059 | 6 | 2129 | 2160 |
| ZIP2 | -1106 | 6 | 2224 | 2254 |
| ZINB2 | -1058 | 7 | 2130 | 2165 |
| ZIP3 | -1087 | 7 | 2188 | 2223 |
| ZINB3 | -1049 | 8 | 2115 | 2155 |
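As an example of the chi-square test mentioned above, compare ZIP1 (nested) with ZIP3, which adds age to both components (df difference 2). A sketch using the rounded log likelihoods from the table:

```python
import math

# Likelihood ratio statistic: 2 * (logLik_full - logLik_reduced),
# compared against a chi-square with df = difference in parameters.
ll_zip1, ll_zip3 = -1106, -1087
stat = 2 * (ll_zip3 - ll_zip1)
p_value = math.exp(-stat / 2)   # chi-square survival function for df = 2
print(stat, p_value)
```

The p-value is far below any conventional threshold, favoring the model with age.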
Bayesian zero inflated Poisson regression
Bayesian models often offer some advantages. We get an entire distribution for every parameter modeled, not just point estimates and confidence intervals. We can also get distributions for any statistic.
We first consider a zero inflated model with intercept-only zero inflation portion, for quick comparison with the analogous frequentist model. We see that results are consistent, and in particular the credible intervals for the coefficients are bounded away from zero.
Bayes intercept only zero inflation, no age covariate
\[ \begin{aligned} \textrm{counts}_i & \sim ZIP(\pi_i, \mu_i) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i + \beta_2\textrm{chf}_i + \beta_3\textrm{atrialfibrillation}_i\\ \log \frac{\pi_i}{1 - \pi_i} & = \gamma_0 \\ \end{aligned} \]
Model results; compare frequentist ZIP model
Bayes
| component | term | Estimate | Est.Error | Q2.5 | Q97.5 |
|---|---|---|---|---|---|
| count | Intercept | 0.278 | 0.079 | 0.119 | 0.432 |
| zero | zi_Intercept | 0.587 | 0.090 | 0.407 | 0.761 |
| count | cohortibis | -0.867 | 0.219 | -1.302 | -0.444 |
| count | chf | 0.159 | 0.119 | -0.077 | 0.386 |
| count | atrial_fibrillation | 0.283 | 0.100 | 0.085 | 0.480 |
Frequentist
| component | term | Estimate | Std. Error |
|---|---|---|---|
| count | (Intercept) | 0.262 | 0.081 |
| count | cohortibis | -1.084 | 0.254 |
| count | chf | 0.167 | 0.121 |
| count | atrial_fibrillation | 0.307 | 0.103 |
| zero | (Intercept) | 0.541 | 0.094 |
Posterior distributions
100 draws from the posterior predictive distribution
Posterior predictive checks
We can check how the distribution of proportions for different values of inpatient_count compares with those in the data.
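A minimal sketch of such a check, simulating counts from the ZIP model at the posterior point estimates above (illustration only; a full posterior predictive check would also draw the parameters from their posterior and include the covariate terms):

```python
import numpy as np

# Simulate from a zero inflated Poisson: a Bernoulli(pi) structural zero,
# otherwise a Poisson(mu) count. Estimates are posterior means from above
# for a baseline control member (chf = atrial_fibrillation = 0).
rng = np.random.default_rng(42)
n = 10_000
pi = 1 / (1 + np.exp(-0.587))   # zi_Intercept
mu = np.exp(0.278)              # count Intercept

z = rng.random(n) < pi
counts = np.where(z, 0, rng.poisson(mu, size=n))
print(round(float(np.mean(counts == 0)), 3))   # simulated zero proportion
```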
Bayes zero inflated with age covariate for count and zero inflation portions of the model
\[
\begin{aligned}
\textrm{counts}_i & \sim ZIP(\pi_i, \mu_i) \\
\log \mu_i & = \beta_0 + \beta_1\textrm{cohort}_i + \beta_2\textrm{age}_i + \beta_3\textrm{chf}_i + \beta_4\textrm{atrialfibrillation}_i\\
\log \frac{\pi_i}{1 - \pi_i} & = \gamma_0 + \gamma_1\textrm{age}_i \\
\end{aligned}
\] Note that for this model we scale the age covariate
\[ \textrm{age} \rightarrow \frac{\textrm{age} - 60}{10} \]
for interpretability and for numerical stability. This affects the value of the coefficient estimates, but not the model fit, significance, or effect on the outcome.
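A small OLS sketch with hypothetical data illustrates this invariance (the same holds for GLMs): rescaling age changes the coefficients but leaves the fitted values unchanged.

```python
import numpy as np

# Fit the same model with raw age and with age scaled as (age - 60) / 10;
# the fits coincide and the slope is rescaled by the factor 10.
rng = np.random.default_rng(1)
age = rng.uniform(40, 90, size=200)
y = 0.02 * age + rng.normal(size=200)

X_raw = np.column_stack([np.ones(200), age])
X_scaled = np.column_stack([np.ones(200), (age - 60) / 10])

b_raw, *_ = np.linalg.lstsq(X_raw, y, rcond=None)
b_scaled, *_ = np.linalg.lstsq(X_scaled, y, rcond=None)

print(bool(np.allclose(X_raw @ b_raw, X_scaled @ b_scaled)))  # same fitted values
print(bool(np.isclose(b_scaled[1], 10 * b_raw[1])))           # slope scaled by 10
```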
Posterior estimates, credible intervals
| component | term | Estimate | Est.Error | Q2.5 | Q97.5 |
|---|---|---|---|---|---|
| count | Intercept | 0.442 | 0.087 | 0.268 | 0.607 |
| zero | zi_Intercept | 1.079 | 0.121 | 0.843 | 1.318 |
| count | cohortibis | -0.929 | 0.218 | -1.367 | -0.503 |
| count | age | -0.126 | 0.046 | -0.214 | -0.034 |
| count | chf | 0.170 | 0.120 | -0.069 | 0.400 |
| count | atrial_fibrillation | 0.335 | 0.106 | 0.128 | 0.542 |
| zero | zi_age | -0.398 | 0.072 | -0.541 | -0.259 |
100 draws from the posterior predictive distribution
Posterior predictive checks
We can check how the distribution of proportions for different values of inpatient_count compares with those in the data.
Math notes
Linear models
Continuous response \(Y\) (eg, heart rate) as a function of predictors, aka covariates, \(X_j\) (eg, cholesterol, smoker)
\[Y \sim N(\mu , \sigma^2)\] where the conditional mean is a linear function of the predictors
\[\mu = E(Y |X ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]
\(\beta_j\) are estimated by minimizing the sum of squared errors, \[\sum_i (y_i - (\beta_0 + \sum_j \beta_j x_{ij}))^2 = ||\mathbf{y} - \mathbf{X}\boldsymbol{\beta}||^2_2\]
We can interpret the coefficients directly as effect sizes.
Generalized linear models
We model a function of the conditional mean as a linear function
\[g(\mu ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]
\(g\) is called a link function.
Example: Logistic regression
\[Y \sim \textrm{Bernoulli}(\mu )\] Coin toss with probability \(\mu\) of heads, \(1 - \mu\) of tails.
\[g(\mu ) = \rm{logit} \ \mu := \log \left(\frac{\mu}{1 - \mu}\right) = X \boldsymbol{\beta}\]
\[\mu = \rm{logit}^{-1}(X \boldsymbol{\beta})\]
Likelihood
To find \(\beta_j\), we typically maximize the (log) likelihood function.
For logistic regression, with \(n\) independent observations \(y_i =\) 0 or 1, the joint probability is
\[p(\mathbf{y} | X) = \prod_{i=1}^n \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}\]
The log likelihood is then
\[logLik(\boldsymbol{\beta}) = \sum_{i=1}^n y_i \log(\mu_i) + (1 - y_i)\log (1 - \mu_i)\]
Where are the \(\beta\)’s?
Recall that \(\mu_i = \rm{logit}^{-1}(X_i \boldsymbol{\beta})\)
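A sketch of this log likelihood evaluated directly, with a single predictor and hypothetical data:

```python
import math

# Logistic regression log likelihood: sum of y*log(mu) + (1-y)*log(1-mu),
# with mu = inverse logit of the linear predictor.
def log_lik(beta0, beta1, x, y):
    total = 0.0
    for xi, yi in zip(x, y):
        mu = 1 / (1 + math.exp(-(beta0 + beta1 * xi)))
        total += yi * math.log(mu) + (1 - yi) * math.log(1 - mu)
    return total

x = [0.0, 1.0, 2.0, 3.0]   # hypothetical predictor values
y = [0, 0, 1, 1]           # hypothetical binary outcomes
print(round(log_lik(-1.5, 1.0, x, y), 3))
```

Maximizing this function over \((\beta_0, \beta_1)\) gives the fitted coefficients.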
Poisson regression
Counts \(Y\) modeled as
\[ p(Y = y; \mu) = \frac{\mu^y e^{-\mu }}{y!} \] with “canonical” link
\[ g(\mu ) = \log(\mu ) = X \boldsymbol{\beta} \]
Likelihood
\[logLik(\boldsymbol{\boldsymbol{\beta}}) = \sum_{i=1}^n \left( y_i \log \mu_i - \mu_i - \log (y_i!) \right)\]
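A sketch for the intercept-only case on hypothetical counts; here the MLE of \(\mu = e^{\beta_0}\) is simply the sample mean.

```python
import math

# Poisson log likelihood for an intercept-only model: mu_i = exp(beta0).
def pois_log_lik(beta0, y):
    mu = math.exp(beta0)
    return sum(yi * beta0 - mu - math.lgamma(yi + 1) for yi in y)

y = [0, 1, 0, 2, 1]                    # hypothetical counts
beta0_hat = math.log(sum(y) / len(y))  # log of the sample mean
print(round(pois_log_lik(beta0_hat, y), 3))
```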
Fun fact
For a linear model, the minimum sum of squares estimate is the same as the maximum likelihood estimate.
Models
Zero inflated Poisson regression
Zero inflated negative binomial regression
Bayes versions of the above
Negative binomial distribution
Often used on overdispersed data: \(\mu < \sigma^2\).
\[p(Y = y; \mu, \theta) = \binom{y + \theta - 1}{y} \left(\frac{\mu}{\mu + \theta}\right)^y \left(\frac{\theta}{\mu + \theta}\right)^\theta\]
Dispersion parameter \(\theta > 0\) is fixed: it does not vary with covariates as \(\mu\) does.
\(\theta\) is estimated by the model
Conditional mean \(E(Y|X) = \mu\). We again model with log link
\[\log \mu = X \boldsymbol{\beta}\]
Variance
\[Var(Y|X) = \mu + \mu^2/\theta\]
\(\theta\) is estimated as part of the model fit, unlike \(\sigma^2\) in linear regression.
Zero inflated Poisson regression
Define \(Z \sim Bernoulli(\pi)\) - a coin flip with probability \(\pi\) of heads.
\[ Y = \begin{cases} 0 & \text{if} \ Z = 1 \\ \sim Pois(\mu) & \text{if} \ Z = 0 \end{cases} \]
\(Z\) models the zero-inflated portion - the “extra” zeros.
The remaining counts, including some zeros, are modeled with the Poisson distribution with mean \(\mu\).
\[ p(Y = y; \mu, \pi) = \begin{cases} \pi + (1 - \pi) e^{-\mu} & \text{if } y = 0 \\ (1 - \pi) \frac{\mu^y e^{-\mu}}{y!} & \text{if } y > 0 \end{cases} \]
(Recall Poisson distribution is \(P(y; \mu) = \mu^y e^{-\mu}/y!\))
\(\log \frac{\pi}{1 -\pi} =\) linear function of some covariates
\(\log(\mu) =\) linear function of some covariates
Likelihood function
We again estimate the coefficients using log likelihood.
\[ \begin{align*} logLik(\mathbf{\beta}, \mathbf{\gamma}) = \sum_{i=1}^n \Big(& \mathbf{1}_{\{y_i = 0\}} \cdot \log \left( \pi_i + (1 - \pi_i) e^{-\mu_i} \right) \\ & + \mathbf{1}_{\{y_i > 0\}} \cdot \left( \log(1 - \pi_i) + y_i \log(\mu_i) - \mu_i - \log(y_i!) \right) \Big) \end{align*} \]
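A direct sketch of this likelihood, with constant \(\pi\) and \(\mu\) for simplicity (hypothetical values; in the regression they vary per observation via the covariates):

```python
import math

# Zero inflated Poisson log likelihood: zeros can come from either the
# inflation component (prob pi) or the Poisson; positive counts only
# from the Poisson, weighted by (1 - pi).
def zip_log_lik(pi, mu, y):
    total = 0.0
    for yi in y:
        if yi == 0:
            total += math.log(pi + (1 - pi) * math.exp(-mu))
        else:
            total += (math.log(1 - pi) + yi * math.log(mu)
                      - mu - math.lgamma(yi + 1))
    return total

y = [0, 0, 0, 1, 0, 2, 0, 0, 1, 0]   # hypothetical zero-heavy counts
print(round(zip_log_lik(0.5, 0.8, y), 3))
```

For these zero-heavy data, a positive \(\pi\) fits better than a plain Poisson (\(\pi = 0\)).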
Bayesian statistical modeling
Starts with Bayes’ Rule
The conditional probability of \(B_k\) given \(A\) is
\[\begin{aligned} P(B_k \ | \ A) & = \frac{P(A \ | \ B_k) P(B_k)}{P(A)} \\ & = \frac{P(A \ | \ B_k) P(B_k)}{\sum_i P(A \ | \ B_i) P(B_i)} \end{aligned}\]
where \(A\) and \(B_j\) are events, with the \(B_j\) forming a partition of the sample space. The vertical bar indicates conditional probability.
The standard disease/positive test example
\[P(D | +) = \frac{P(+ | D) \ P(D)}{P(+ | D) P(D) + P(+ | \bar{D}) P(\bar{D})} \]
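A numeric sketch with hypothetical values (1% prevalence, 90% sensitivity, 5% false positive rate):

```python
# Bayes' rule for the disease / positive test example.
p_d = 0.01     # P(D), prevalence
sens = 0.90    # P(+ | D)
fpr = 0.05     # P(+ | not D)

p_pos = sens * p_d + fpr * (1 - p_d)   # total probability of a positive test
p_d_given_pos = sens * p_d / p_pos     # posterior probability of disease
print(round(p_d_given_pos, 3))         # ~0.154, despite the 90% sensitivity
```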
Bayes’ Rule for probability densities for model parameters
\[p(\boldsymbol{\theta} | y, X) = \frac{p(\boldsymbol{\theta}) p(y | \boldsymbol{\theta}, X)}{\int p(y | \boldsymbol{\theta}, X) p(\boldsymbol{\theta}) d\boldsymbol{\theta}}\]
random vector of model parameters \(\boldsymbol{\theta}\) (our \(\beta\)’s and \(\gamma\)’s)
fixed predictor data \(X\) and response data \(y\)
\(p(\boldsymbol{\theta})\) is the prior distribution for the parameters
\(p(y | \boldsymbol{\theta}, X)\) is the likelihood of the data given the parameters \(\theta\) - same as the likelihood function in frequentist statistics.
\(p(\boldsymbol{\theta} | y, X)\) is the posterior distribution for the parameters \(\boldsymbol{\theta}\).
The denominator is an integration over the parameter space; typically intractable \(\Rightarrow\) Markov Chain Monte Carlo (MCMC) methods to approximate the posterior distribution.
Posterior predictive distribution
Once you have the \(\theta\) posterior, you can compute the posterior predictive distributions of a new observation \(x_0\), given the data \(X\).
\[\begin{align*} p(x_0|X) & = \int p(x_0, \theta | X)\, d\theta \\ \\ & = \int p(x_0| \theta, X) \, p(\theta |X) \, d\theta \\ \\ & = \int p(x_0|\theta) \, p(\theta | X) \, d\theta\\ \end{align*}\]
We can also get a distribution for any statistic of interest.
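A Monte Carlo sketch of the predictive integral for the coin-toss example, with a flat Beta(1, 1) prior and hypothetical data (62 heads in 100 tosses); the exact predictive probability of heads is \((h+1)/(n+2)\):

```python
import numpy as np

# Approximate p(x0 | X) = integral of p(x0 | theta) p(theta | X) dtheta by
# sampling theta from the Beta posterior, then a new toss for each draw.
rng = np.random.default_rng(7)
h, n = 62, 100   # hypothetical: 62 heads in 100 tosses

theta_draws = rng.beta(h + 1, n - h + 1, size=100_000)   # posterior draws
p_heads_mc = rng.binomial(1, theta_draws).mean()         # Monte Carlo estimate
p_heads_exact = (h + 1) / (n + 2)                        # closed form
print(round(float(p_heads_mc), 3), round(p_heads_exact, 3))
```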
Bayesian inference
Parameters are regarded as random variables, data fixed
We get an entire distribution for every parameter, and hence for any statistic we wish to compute
No p-values, no hypothesis testing!
Example
\(\pi\) is the probability of heads in a single coin toss,
data is the sequence of heads and tails in 100 tosses
By contrast, our maximum likelihood estimate of \(\pi\) would be the peak of the likelihood, which coincides with the peak (mode) of the posterior under a flat prior.
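A one-line check of this fact for the coin example, with hypothetical data: under a flat prior the posterior is Beta(h + 1, n - h + 1), whose mode equals the MLE \(h/n\).

```python
# With a flat prior, the posterior mode equals the maximum likelihood estimate.
h, n = 62, 100                          # hypothetical: heads, total tosses

mle = h / n                             # maximizes the binomial likelihood
post_mode = (h + 1 - 1) / (n + 2 - 2)   # mode of Beta(h + 1, n - h + 1)
print(mle, post_mode)
```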